Some areas in auditory cortex respond preferentially to sounds that 
elicit pitch, such as musical sounds or voiced speech. This study 
used human electroencephalography (EEG) with an adaptation para- 
digm to investigate how pitch is represented within these areas and, 
in particular, whether the representation reflects the physical or 
perceptual dimensions of pitch. Physically, pitch corresponds to a 
single monotonic dimension: the repetition rate of the stimulus wave- 
form. Perceptually, however, pitch has to be described with 2 dimen- 
sions, a monotonic, "pitch height," and a cyclical, "pitch chroma," 
dimension, to account for the similarity of the cycle of notes (c, d, e, 
etc.) across different octaves. The EEG adaptation effect mirrored the 
cyclicality of the pitch chroma dimension, suggesting that auditory 
cortex contains a representation of pitch chroma. Source analysis 
indicated that the centroid of this pitch chroma representation lies 
somewhat anterior and lateral to primary auditory cortex. 

Keywords: electroencephalography, Heschl's gyrus, musical pitch, octave 
similarity, stimulus-specific adaptation 

Introduction 

Pitch is one of the most important perceptual features of 
sound. It conveys prosody and speaker identity in speech 
(Smith and Patterson 2005) and melody in music (Scherer 
1995), and it is one of the most important cues for segregating 
sounds from different sources in the environment (Darwin 
1997; Carlyon 2004). Most tonal sounds have temporally peri- 
odic pressure waveforms, and their pitch is determined by the 
waveform repetition rate, R (R is equal to the reciprocal of the 
repetition period, P; Fig. 1A). Thus, physically, pitch corre- 
sponds to a single, monotonic dimension ranging from low to 
high. Perceptually, however, pitch has 2 dimensions: a mono- 
tonic, "pitch height," dimension, reflecting the octave, within 
which a given note resides, and a cyclical, "pitch chroma," di- 
mension, representing the cycle of notes within each octave 
(Dowling 1999; Fig. 1B). This is why music psychologists represent 
pitch as a helix, with the linear dimension of the helix 
representing pitch height and the circular dimension representing 
pitch chroma (Ueda and Ohgushi 1987; Fig. 1C). The 
distance between each 2 points within the helix reflects the 
perceptual similarity between the corresponding notes: verti- 
cally aligned notes have the same pitch chroma (i.e. differ by 
an octave; e.g. C4 and C5 in Fig. 1C) and are thus perceived as 
more similar than notes on opposite sides of the helix, which 
have the maximum possible chroma difference (i.e. a half-octave, 
or "tritone"; e.g. C5 and F♯4). 
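
The geometry of the helix can be made concrete with a short sketch. The radius and the rise per octave below are illustrative values chosen here, not parameters taken from the cited work. With them, two notes an octave apart are closer than two notes a tritone apart, which holds whenever the radius exceeds about 0.43 times the rise per octave.

```python
import math

def helix_distance(delta_octaves, radius=1.0, rise_per_octave=1.0):
    """Straight-line distance between two notes on a pitch helix.

    radius and rise_per_octave are hypothetical geometry parameters.
    """
    # Chroma: chord across the circle for an angular offset of 2*pi*delta.
    chord = 2.0 * radius * abs(math.sin(math.pi * delta_octaves))
    # Height: linear rise along the axis of the helix.
    rise = rise_per_octave * abs(delta_octaves)
    return math.hypot(chord, rise)

octave = helix_distance(1.0)    # same chroma, one turn higher
tritone = helix_distance(0.5)   # opposite side of the helix
print(octave < tritone)
```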

Neuroimaging studies have suggested that anterolateral 
Heschl's gyrus responds more strongly to sounds with salient 
pitch than to sounds with no or weak pitch (Gutschalk et al. 
2002; Patterson et al. 2002; Krumbholz et al. 2003; Penagos 
et al. 2004; Puschmann et al. 2010). Results from single-unit 
recordings suggest that the monkey homologue of this area 
contains neurons that are selective for pitch (Bendor and 
Wang 2005, 2010). Owing to the limitations in spatial resol- 
ution of noninvasive neuro-recording techniques, it would be 
difficult to measure cortical selectivity for pitch in humans 
using conventional stimulation paradigms. However, it has 
been suggested that paradigms based on adaptation might 
offer a means to overcome these limitations. The idea is that 
presentation of an "adapter" stimulus (A) produces a tempor- 
ary reduction in the sensitivity of neurons responsive to that 
stimulus. The neural response to a subsequent "probe" stimu- 
lus (A-P) would then be assumed to reflect the represen- 
tational similarity of the adapter and probe: a strongly adapted 
probe response would indicate that the adapter and probe are 
represented by the same, or similar, groups of neurons 
(Fig. 2A), whereas a weakly adapted probe response would 
indicate recruitment of mostly new, or unadapted, neurons by 
the probe, and thus representation by different groups of 
neurons (Fig. 2B). This idea has been widely used to probe 
sensory representations in human cortex with psychophysics 
(e.g. Blakemore and Campbell 1969; Kay and Matthews 1972), 
noninvasive electrophysiological recording techniques (electro- 
encephalography [EEG] and magnetoencephalography [MEG]; 
e.g. Butler 1968; Näätänen et al. 1988) and, more recently, 
functional magnetic resonance imaging (fMRI; Grill-Spector 
and Malach 2001; see Grill-Spector et al. 2006, for review). 

This study used this adaptation approach to probe the 
neural representation of pitch in human auditory cortex. The 
aim was to investigate whether adaptation of auditory cortical 
responses is selective for pitch and, if so, whether the selec- 
tivity reflects the perceptual similarity between notes of 
the same chroma. Adaptation was measured with EEG. The 
adapter and probe were complex tonal sounds similar to the 
sounds produced by most musical instruments or the voiced 
portions of speech. They differed from each other in terms of 
repetition rate, R, and thus pitch. If pitch is represented in 
terms of its physical dimension (i.e. repetition rate), the probe 
response should increase monotonically with increasing pitch 
separation between the adapter and probe. However, if pitch 
is represented in terms of its perceptual dimensions (pitch 
height and pitch chroma), the function relating the probe 
response size to the pitch separation, henceforth referred to 
as "adaptation function," should be nonmonotonic, with a dip 
at octave pitch separations. For comparison, we also included 
a condition in which the adapter and probe were sinusoids 
(or "pure tones") differing in frequency. Previous neurophy- 
siological studies have assumed that the cortical coding of 
pitch involves dedicated "pitch neurons" that code pitch invariant 
of the stimulus spectral composition (Schwarz and 
Tomlinson 1990; Fishman et al. 1998; Steinschneider et al. 
1998; Bendor and Wang 2005, 2010; but see Schnupp and 
Bizley 2010, for a recent critique of this idea). This idea has 
arisen as a result of the fact that sounds with different spectral 
compositions, and thus different "timbre," can still elicit the 
same pitch. Examples include the different vowels in speech or 
the sounds produced by different musical instruments. Dedicated 
pitch neurons would be expected to be similarly activated 
by pure tones as by complex tones with the same pitch. Under 
this assumption, the pure-tone condition would be expected to 
yield a similar pattern of results as the complex-tone condition. 

Figure 1. Physical and perceptual dimensions of pitch. (A) Pressure waveform of middle C played on the piano. The waveform repetition rate, R, is the reciprocal of the 
waveform period, P. (B) Schematic piano keyboard (2 octaves shown), indicating the position of middle C. (C) The pitch helix, consisting of a linear, "pitch height," dimension and 
a cyclical, "pitch chroma", dimension. Two notes separated by an octave (blue line) have a lesser distance than 2 notes separated by a half-octave (red line). 

Figure 2. Schematic illustrating the rationale of the adaptation paradigm. (A) A 
strongly adapted probe response is taken to indicate that the adapter (black) and 
probe (gray) are represented by the same, or similar, groups of neurons. (B) A weakly 
adapted probe response is taken to indicate representation by different groups of 
neurons. 



Materials and Methods 

Stimuli 

Each trial consisted of an adapter stimulus followed immediately (in 
order to maximize the adaptation effect) by a short (250-ms) probe 
stimulus (Fig. 3A). The stimuli were gated on and off with 10-ms 
quarter-cosine ramps to avoid audible clicks. At their transition, the 
gates were crossfaded so that the intensity envelope of the composite 
stimulus remained flat. The adapter was much longer than the probe 
(1500 ms) to allow the response to the adapter onset ("OnR" in 
Fig. 3B) to subside before the probe onset. The adapter and probe 
were either pure tones or a quasiperiodic noise, referred to as "iter- 
ated rippled noise," or IRN (Yost et al. 1996). All stimuli were pre- 
sented binaurally at an overall level of 70 dB SPL. The silent gap 
between trials was 1500 ms. 

Apart from being only partially periodic, IRN is similar to the har- 
monic complex tones used in many previous studies of pitch proces- 
sing. IRN was generated using the "add-original" procedure described 
by Yost et al. (1996). This involves mixing a sample of random (Gaus- 
sian) noise with a copy of the same noise sample, delayed by the 
"quasiperiod," P, and then iterating the process (the current study 
used 16 iterations). P is equivalent to the period in harmonic tones. 
Figure 3. Stimulus design and average response. (A) Example stimulus waveform. 
The adapter is plotted in black, and the probe in gray. (B) Average response across all 
participants and pitch separations. Each black line represents a recording channel and 
the response from the vertex channel (Cz) is plotted in gray. OnR, Onset Response; 
SR, Sustained Response; PR, Probe Response. 



The procedure imparts a degree of periodicity to the noise waveform, 
which gives rise to a pitch at the reciprocal of the quasiperiod, R 
(henceforth referred to as "repetition rate"). IRN has been shown to 
activate similar brain areas as harmonic tones (Penagos et al. 2004) 
but produce a stronger pitch-related response (Barker et al. 2012). 
Barker et al. raised the concern that the stronger response to IRN 
might be related to longer-term spectro-temporal modulations that are 
present in IRN but not in harmonic tones. However, these modu- 
lations could not explain transient electrophysiological (EEG and 
MEG) responses to IRN such as those measured in this study or the 
study by Krumbholz et al. (2003), because these responses set in only 
a few tens of milliseconds after the stimulus onset (see Fig. 6), at 
which point the longer-term modulations have not yet unfolded. This 
was confirmed by Steinmann and Gutschalk (2012) using MEG; they 
showed that both the transient and sustained MEG responses to IRN 
are unaffected by the IRN modulations. The adapter and probe were 
generated afresh, using new noise samples, for each trial. The rep- 
etition rate (IRN) or frequency (pure tones) of the probe stimulus had 
a nominal value of 125 or 500 Hz, respectively. These values were 
chosen to ensure that the repetition rates of the IRN stimuli were 
within the range that is relevant for music and speech, and that the 
frequencies of the pure-tone stimuli were within the hearing range. 
The exact value of the probe repetition rate or frequency was varied 
from trial to trial within a one-third-octave range around the nominal 
value to avoid across-trial adaptation to the probe. The repetition rate 
or frequency of the adapters was varied relative to that of the probe 
to vary the pitch separation between the adapter and probe. In Experiment 1, 
pitch separations of 0.5, 1, and 1.5 octaves were used. The 
0.5- and 1.5-octave conditions had the same pitch chroma separation 
(a half-octave, or tritone, modulo one octave), and their average pitch 
height separation matched that of the 1-octave condition. 

Figure 4. Spectral properties of IRN stimuli. (A and B) The harmonics of the adapter 
(black bars) and probe (gray bars) have greater overlap when the adapter and probe 
are separated by an octave (A) than a half-octave (B). (C-E) Resolved harmonics 
(C) stimulate separate cochlear filters and, as a result, produce peaks in the distribution 
of activity across the tonotopic array (E, green line). Multiple unresolved harmonics 
(D) fall into each cochlear filter, producing a uniform activity distribution (E, red line). 

Figure 5. Degree of spectral resolvability of IRN stimuli used in Experiment 1. The 
ordinate shows the number of harmonics falling into one cochlear filter as a function 
of the filter center frequency (abscissa). The dashed line shows the probe stimulus. 
The dotted and solid lines show the unresolved and resolved adapters, respectively; 
the parameter is the pitch separation from the probe (see labels on the right). The 
gray horizontal bar marks the limit of harmonic resolvability according to the criterion 
of Shackleton and Carlyon (1994; 2-3.25 harmonics per filter). Above this limit, 
harmonics are unresolved (red line segments) and below, harmonics are resolved 
(green line segments). 

Figure 6. Average probe response to each stimulus condition in Experiment 1. The 
arrows illustrate how the response sizes and latencies were measured. 
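
The delay-and-add procedure for IRN described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab code: the function names are hypothetical, the gain of 1.0 is assumed, and "add-original" is read here as adding the delayed running output back to the original noise at each iteration. Gating, bandpass filtering, and level calibration are not shown.

```python
import random

def irn_add_original(rate_hz, dur_s, fs=25000, n_iter=16, gain=1.0, rng=None):
    """Iterated rippled noise via a delay-and-add loop (add-original reading).

    At each iteration the running output is delayed by the quasiperiod
    and added back to the original noise. gain=1.0 is an assumption.
    """
    rng = rng or random.Random()
    delay = max(1, round(fs / rate_hz))   # quasiperiod P in samples
    n_out = round(dur_s * fs)
    lead = n_iter * delay                 # build-up before all copies overlap
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_out + lead)]
    out = list(noise)
    for _ in range(n_iter):
        delayed = [0.0] * delay + out[:-delay]
        out = [x + gain * d for x, d in zip(noise, delayed)]
    return out[lead:]                     # drop the build-up portion

def autocorr(x, lag):
    """Normalized autocorrelation of x at a given lag (in samples)."""
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]
    num = sum(a * b for a, b in zip(xc[:-lag], xc[lag:]))
    den = sum(v * v for v in xc)
    return num / den

probe_rate = 125.0                        # nominal IRN probe rate from the text
adapter_rate = probe_rate * 2.0 ** 1.0    # adapter one octave above the probe
probe = irn_add_original(probe_rate, 0.25, rng=random.Random(0))
# A value close to 1 at the quasiperiod lag indicates strong periodicity.
print(round(autocorr(probe, round(25000 / probe_rate)), 2))
```

Passing a fresh `random.Random()` on each call mirrors the text's statement that the stimuli were generated afresh for each trial.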

IRN contains spectral peaks at frequencies corresponding to 
integer multiples ("harmonics") of the stimulus repetition rate. This 
creates potential for confound, because 2 notes separated by an 
octave have greater spectral overlap (they share every other harmonic, 
see Fig. 4A) than notes separated by a half-octave (their harmonics 
are nonoverlapping; Fig. 4B). A smaller probe response for the 
1-octave than half-octave conditions might thus arise as a result of 
stronger adaptation of frequency-selective neurons, rather than selectivity 
to pitch chroma. However, this confound only applies when the 
harmonics are resolved by the cochlear frequency filters (i.e. the 
spacing between adjacent harmonics is greater than the widths of the 
filter tuning curves; Fig. 4C). Resolved harmonics produce peaks in 
the pattern of activity across the tonotopic map (green line in 
Fig. 4E), whereas unresolved harmonics (i.e. each cochlear filter responds 
to multiple harmonics; Fig. 4D) produce a uniform activity 
distribution (red line in Fig. 4E). The frequency tuning width of the 
cochlear filters increases roughly proportionally with the filter frequency 
(Glasberg and Moore 1990). This means that, for a given harmonic 
sound, only harmonics up to about the 10th are resolved 
(Shackleton and Carlyon 1994). It also means that, for a given frequency 
band, harmonic sounds with repetition rates below about 
1/10th of the lower edge of the band are unresolved across the entire 
band, and sounds with repetition rates above 1/10th of the lower 
edge are at least partially resolved. To investigate the effect of harmonic 
resolvability, we used both resolved and unresolved IRN adapters. 
Any difference in probe response size between the 1-octave and half-octave 
conditions for the unresolved IRN adapters would have to be 
assumed to reflect the properties of pitch-selective neurons. The IRN 
stimuli were bandpass-filtered between 800 Hz and 3.2 kHz using an 
eighth-order Butterworth IIR filter (which yields a −24-dB/oct filter 
roll-off). This meant that the IRN probe was resolved within about 
the lower third of the stimulus passband (see dashed line in Fig. 5). 
The repetition rates of the adapters were either above (solid lines in 
Fig. 5) or below (dotted lines) the probe repetition rate. The adapters 
with the higher rates (+0.5, +1, +1.5 octaves) were resolved within at 
least about half of the passband, and the adapters with the lower 
rates (−0.5, −1, −1.5 octaves) were unresolved over practically the 
entire band. 
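
The resolvability argument can be illustrated by counting how many harmonics fall into one cochlear filter across the stimulus passband. The sketch below uses the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990) as the filter width, which is an assumption: the bandwidth measure underlying Fig. 5 may differ, so the labels printed here need not reproduce that figure. The function names and the `bandwidth_scale` parameter are hypothetical.

```python
def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg and Moore 1990), in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def harmonics_per_filter(fc_hz, rate_hz, bandwidth_scale=1.0):
    """Number of harmonics (spaced rate_hz apart) inside one filter at fc_hz.

    bandwidth_scale is a hypothetical knob for other bandwidth definitions.
    """
    return bandwidth_scale * erb_hz(fc_hz) / rate_hz

def classify(n_harmonics, lower=2.0, upper=3.25):
    """Label a harmonic count using the 2-3.25 limits quoted for Fig. 5."""
    if n_harmonics < lower:
        return "resolved"
    if n_harmonics > upper:
        return "unresolved"
    return "intermediate"

probe_rate = 125.0  # nominal IRN probe repetition rate in Hz
for sep in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
    rate = probe_rate * 2.0 ** sep   # adapter rate at this pitch separation
    counts = [harmonics_per_filter(fc, rate) for fc in (800.0, 1600.0, 3200.0)]
    print(sep, [f"{n:.2f} ({classify(n)})" for n in counts])
```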

In the pure-tone conditions, which were used in Experiments 1 
and 3 (see below), adapter frequencies both above and below the 
probe frequency were used for each pitch separation (0.5, 1, 1.5 
octaves for Experiment 1, and 1.5 octaves for Experiment 3). 

Both the IRN and pure-tone stimuli were presented in a back- 
ground of masking noise, presented continuously throughout the 
data acquisition. The masker for the IRN stimuli was intended to 
prevent audible distortion products below the stimulus passband. It 
was lowpass-filtered at a half-octave below the lower edge of the 
band (i.e. 566 Hz) using an eighth-order Butterworth filter as before 
(-24-dB/oct filter roll-off) and presented at a level of 40 dB SPL per 
cochlear-filter bandwidth (as defined by Glasberg and Moore 1990). 
The masker for the pure-tone stimuli was intended to approximately 
equalize the level above detection threshold (sensation level), and 
thus the loudness, of the stimuli across the different adapter frequencies 
used. It was presented at a level of 30 dB SPL per cochlear-filter 
bandwidth. 

Stimuli were generated digitally at a sampling rate of 25 kHz using 
Matlab (The Mathworks). They were digital-to-analogue converted 
with a TDT System 3 (consisting of an RP2.1 real-time digital signal 
processor and an HB7 headphone amplifier; Tucker-Davis Technologies) 
and presented through K 240 DF headphones (AKG). 



Procedure 

Experiment 1 investigated whether adaptation to pitch is sensitive to 
pitch chroma and consisted of 2 sessions, one for the IRN stimuli and 
one for the pure tones. In both sessions, stimuli were presented in 4 
approximately 20-min blocks. Each block contained 372 trials (62 for 
each pitch separation). The pitch separations were presented in 
random order. 

Experiment 2 investigated whether the nonmonotonicity of the 
adaptation functions for the IRN stimuli was due to a pitch chroma or 
a consonance effect. It consisted of a single session with 4 blocks. 
Each block contained 434 trials (31 for each of the 14 pitch separations 
used) and lasted approximately 24 min. As in Experiment 1, 
the pitch separations were presented in random order. 

Experiment 3 estimated the source locations of the probe 
responses measured in Experiments 1 and 2. The stimuli were pre- 
sented in a single session consisting of 6 blocks, 2 for the unresolved 
IRN stimuli (−1.5-octave pitch separation between the adapter and 
probe), 2 for the resolved IRN stimuli (+1.5-octave pitch separation) 
and 2 for the pure tones (1 with a pitch separation of −1.5 octaves, 
and the other with a +1.5-octave pitch separation). The blocks for the 
resolved IRN stimuli and the pure tones each contained 250 trials and 
lasted approximately 14 min. To compensate for the smaller probe 
response size for the unresolved IRN stimuli, the blocks for this con- 
dition contained 400 trials and lasted approximately 22 min. The 6 
blocks were presented in a random order. 



Data Acquisition 

Auditory-evoked cortical potentials were recorded from 32 (Exper- 
iments 1 and 2) or 64 (Experiment 3) Ag/AgCl ring electrodes 
(Easycap, Herrsching, Germany). The 32 electrodes were placed according 
to the standard 10-20 arrangement (Jasper 1958). The 64 
electrodes were placed according to an extended 10-20 arrangement 
that provided greater coverage of the lower half of the head surface 
("Infracerebral" cap, Easycap). In all experiments, the recording refer- 
ence was the vertex electrode (Cz) and the ground electrode was 
placed on the central forehead (AFz). Skin-to-electrode impedances 
were kept below 5 kΩ throughout the recordings. The electrode 
signals were amplified with BrainAmp DC EEG amplifiers (Brain Pro- 
ducts) and bandpass-filtered online between 0.1 and 250 Hz. The 
signals were sampled at 500 Hz and stored for offline analysis using 
the Brain Vision Recorder software (Brain Products). Participants 
watched a subtitled movie throughout the recordings to remain alert. 



Data Analysis 

The data from Experiments 1 and 2 were preprocessed using the 
EEGLAB toolbox (Delorme and Makeig 2004), which runs under 
Matlab. They were 1) lowpass-filtered at 35 Hz using a −48-dB per 
octave zero-phase IIR filter, 2) down-sampled to 250 Hz, 3) re- 
referenced to average reference, 4) segmented into 2350-ms epochs 
ranging from 100 ms before the start of the adapter to 500 ms after 
the end of the probe, and 5) baseline-corrected to the 100-ms presti- 
mulus period. Epochs containing unusually large potentials across 
many electrodes (outside of ±3 SD) were rejected using EEGLAB's 
"joint-probability" function. This led to the rejection of an average of 
15% of epochs in Experiment 1, and 17% in Experiment 2. The re- 
maining epochs were submitted to an independent component analy- 
sis (extended infomax algorithm; Bell and Sejnowski 1995; Lee et al. 
1999) for each run and each participant separately. Components representing 
eye blinks, lateral eye movements, and electrocardiac 
activity were removed by manual inspection. Epochs were then averaged 
for each participant and condition. The averaged responses 
were converted from sensor to source space using the Brain Electrical 
Source Analysis software (BESA, Gräfelfing). The source model consisted 
of 2 equivalent current dipoles placed at the centroids of 
primary area TE1.0 in the left and right hemispheres (Morosan et al. 
2001). A 4-shell ellipsoidal volume conductor was used as a head 
model. The dipole orientations were fitted to the average probe 
response across conditions and participants using a time window encompassing 
the P1, N1, and P2 deflections (0-300 ms after probe 
onset). The resulting source model was used as a spatial filter to 
create 2 source waveforms for each condition and participant, 1 for 
the dipole in each hemisphere. The source waveforms were averaged 
across hemispheres to improve the response-to-noise ratio (none of 
the subsequent statistical analyses yielded any interaction with hemisphere; 
all P > 0.05). 
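
The segmentation and baseline-correction steps (steps 4 and 5 above) can be sketched in a few lines for a single channel. This is an illustration only, with hypothetical function names; it is not the EEGLAB implementation, and filtering, re-referencing, epoch rejection, and independent component analysis are not shown.

```python
def epoch_and_baseline(signal, onsets, fs, pre_s=0.1, post_s=2.25):
    """Cut epochs around adapter onsets and subtract the prestimulus mean.

    signal: samples from one channel; onsets: adapter onset sample indices.
    post_s = 1.5 s adapter + 0.25 s probe + 0.5 s after the probe.
    """
    n_pre = round(pre_s * fs)
    n_post = round(post_s * fs)
    epochs = []
    for onset in onsets:
        start, stop = onset - n_pre, onset + n_post
        if start < 0 or stop > len(signal):
            continue  # skip epochs that run off the recording
        segment = signal[start:stop]
        baseline = sum(segment[:n_pre]) / n_pre
        epochs.append([v - baseline for v in segment])
    return epochs

def average_epochs(epochs):
    """Sample-by-sample mean across epochs of equal length."""
    return [sum(col) / len(epochs) for col in zip(*epochs)]

# Toy recording: a constant offset, which baseline correction should remove.
toy = [3.0] * 1000
toy_epochs = epoch_and_baseline(toy, onsets=[100, 500], fs=100)
print(len(toy_epochs), len(toy_epochs[0]), max(average_epochs(toy_epochs)))
```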

The data from Experiment 3 were preprocessed using BESA. For 
each participant, the data from all 6 runs were concatenated and 
searched for potentials resembling eye blinks or lateral eye move- 
ments. The potentials for the eye blinks and lateral eye movements 
were averaged separately, and their first spatial principal components 
were used to define the respective artifact topographies. In order to 
correct for ocular artifacts, the artifact topographies were incorporated 
into the subsequent source model (described in Results section). As 
for Experiments 1 and 2, the data were segmented into 2350-ms 
epochs from 100 ms before the adapter onset to 500 ms after the 
probe offset. Epochs with voltages exceeding ±120 µV were discarded. 
On average, 10% of epochs were removed. The remaining 
epochs were averaged for each condition and participant. The par- 
ameters for the source modeling were the same as for Experiments 1 
and 2, apart from the dipoles being unconstrained in both orientation 
and location and the fitting being based on the individual rather than 
the grand-average responses. 

The size of the probe responses was measured, in first instance, 
using the N1-P2 peak-to-peak difference (see arrows in Fig. 6). The 
N1 and P2 deflections have opposite polarities and partly overlapping 
time courses (Näätänen and Picton 1987; Makeig et al. 1997), and 
may thus partially cancel each other. Using the peak-to-peak, rather 
than baseline-to-peak, measure of the probe response size prevents this 
cancellation from affecting the data pattern. Many of the earlier 
studies that have measured stimulus selectivity of adaptation in the 
auditory-evoked cortical potentials have taken a similar approach (reviewed 
in Näätänen and Picton 1987). Subsequently, we also 
measured the sizes of the N1 and P2 peaks separately to examine 
whether they showed similar effects. The latency of the probe 
response was taken as the N1 peak latency. 
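
A minimal sketch of the peak-to-peak measure is given below. The search windows are hypothetical values chosen for illustration (they are not given in the text above), and the function name is ours.

```python
def n1_p2_size(waveform, fs, probe_onset_s,
               n1_window=(0.07, 0.18), p2_end_s=0.30):
    """Return (N1-P2 peak-to-peak size, N1 latency in s) for one waveform.

    waveform: averaged source waveform; probe_onset_s: probe onset relative
    to the first sample. The windows are hypothetical, not from the paper.
    """
    onset = round(probe_onset_s * fs)
    lo = onset + round(n1_window[0] * fs)
    hi = onset + round(n1_window[1] * fs)
    # N1: most negative sample inside the N1 window.
    n1_idx = min(range(lo, hi + 1), key=lambda i: waveform[i])
    # P2: most positive sample between the N1 peak and the end of the P2 window.
    p2_stop = onset + round(p2_end_s * fs)
    p2_idx = max(range(n1_idx, p2_stop + 1), key=lambda i: waveform[i])
    size = waveform[p2_idx] - waveform[n1_idx]
    latency = (n1_idx - onset) / fs
    return size, latency

fs = 1000
wave = [0.0] * 500
wave[100] = -2.0   # toy N1 at 100 ms after probe onset
wave[200] = 3.0    # toy P2 at 200 ms after probe onset
print(n1_p2_size(wave, fs, probe_onset_s=0.0))
```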

In Experiment 1, the effect of pitch height was tested by compar- 
ing the probe response sizes for the 0.5- and 1.5-octave pitch separ- 
ations, and the effect of pitch chroma was tested by comparing the 
response size for the 1-octave pitch separation with the mean 
response size for the 0.5- and 1.5-octave separations. 
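
The two comparisons amount to simple contrasts on the three response sizes. The sketch below shows only the arithmetic, using made-up numbers rather than data from the study; the statistical testing of these contrasts is not shown.

```python
def height_and_chroma_effects(size_by_separation):
    """Contrasts on mean probe response sizes keyed by separation in octaves."""
    half = size_by_separation[0.5]
    one = size_by_separation[1.0]
    one_half = size_by_separation[1.5]
    # Pitch height: 1.5-octave versus 0.5-octave separation (same chroma distance).
    height_effect = one_half - half
    # Pitch chroma: mean of the two tritone-chroma conditions versus the octave.
    chroma_effect = (half + one_half) / 2.0 - one
    return height_effect, chroma_effect

toy_sizes = {0.5: 4.0, 1.0: 3.0, 1.5: 5.0}   # made-up values for illustration
print(height_and_chroma_effects(toy_sizes))
```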

In Experiment 2, the pitch height and pitch chroma effects were 
assessed by fitting the adaptation functions for the IRN conditions 
with a combined sinusoidal and linear function of pitch separation (red 
dashed lines in Fig. 7H, I). According to the pitch helix model, the sinu- 
soidal component represents the pitch chroma distance, and the linear 
component the pitch height distance, between the adapter and probe. 
The function was defined by P(ΔR) = a · sin(π · ΔR) + b · ΔR + c, 
where P is the probe response size, ΔR is the pitch separation in 
octaves, and a, b, and c are free parameters; a and b are scaling 
factors for the sinusoidal and linear function components, respect- 
ively, and c is a constant offset. 
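
Because the function is linear in its three free parameters, it can be fitted by ordinary least squares. The sketch below is an illustration under stated assumptions rather than the authors' fitting procedure: it reads the sinusoidal term as sin(π · ΔR), passes that term in as a replaceable function (`chroma_term`, a hypothetical name) so that another reading of the formula, such as a rectified sine, could be substituted, and uses synthetic responses instead of measured data.

```python
import math

def solve_linear(a, b):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_helix_model(separations_oct, responses,
                    chroma_term=lambda d: math.sin(math.pi * d)):
    """Least-squares estimates of (a, b, c) via the normal equations."""
    rows = [[chroma_term(d), d, 1.0] for d in separations_oct]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(3)]
    return solve_linear(ata, aty)

# Pitch separations of Experiment 2 in octaves; responses are synthetic.
seps = [s / 12.0 for s in (6, 7, 9, 12, 13, 16, 18)]
true_a, true_b, true_c = 0.8, 0.5, 1.0
synthetic = [true_a * math.sin(math.pi * d) + true_b * d + true_c for d in seps]
print([round(p, 3) for p in fit_helix_model(seps, synthetic)])
```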



Participants 

Experiment 1 was conducted with a total of 15 participants (7 males, 
mean age ± SD: 23.5 ± 4.9 years), 6 of whom only completed the IRN 
session (4 blocks) and 5 only completed the pure-tone session 
(4 blocks). Four participants completed both sessions (all 8 blocks) 
on different days. Twelve participants (6 male, age: 22.1 ± 5.6 years) 
took part in Experiment 2 (4 blocks), and 8 participants (3 male, age: 
22.0 ± 2.0 years) in Experiment 3 (6 blocks). The participants in 
Experiment 2 and in the IRN session of Experiment 1 were 
nonoverlapping. 

Figure 7. Average probe response sizes and latencies for Experiments 1 and 2. Error bars denote ±1 standard error of the mean (SEM). (A-C) Probe response sizes for the 
pure-tone (A), unresolved (B), and resolved (C) IRN conditions in Experiment 1, averaged across participants and plotted as a function of the pitch separation between the 
adapter and probe. (D-F) Probe response sizes as in panels A-C, but broken down into contributions from the N1 (black solid lines) and P2 (blue dashed lines) deflections. 
(G) Average probe response latencies from Experiment 1 as a function of pitch separation. (H and I) Probe response sizes for the unresolved and resolved IRN conditions in 
Experiment 2 (black lines and symbols) with function fits representing the helical model of pitch perception (red dashed lines). 

All participants had hearing thresholds of 20 dB HL or better at 
audiometric frequencies between 250 and 4000 Hz, and had no 
history of audiological or neurological disease. The participants in 
Experiment 3 were screened for large EEG responses using a short 
(100-ms) 1000-Hz tone pip, presented at a rate of 1 per 1.5 s, as test 
stimulus. Participants with vertex N1 amplitudes of less than 7 µV 
(using a linked-mastoid reference) were excluded. Participants gave 
written informed consent. The study procedures were approved by 
the Ethics Committee of the University of Nottingham School of 
Psychology. 



Results 

Dependence of Adaptation on Pitch Separation 

The probe response ("PR" in Fig. 3B) had a similar triphasic 
morphology as the adapter onset response ("OnR"), with a 
small initial positive peak (referred to as "P1"; Näätänen and 
Picton 1987), a large negative peak ("N1") and another larger 
positive peak ("P2"). Overall, the pure-tone condition yielded 
the largest and earliest probe responses (black line and arrow 
in Fig. 6), followed by the resolved (blue) and then unre- 
solved (green) IRN conditions. 

A linear mixed model (LMM) analysis of the probe response 
sizes (measured as the N1-P2 peak-to-peak difference) 
for the pure-tone condition in Experiment 1 was conducted to 
test for any effects of the frequency difference direction 
(adapter above or below the probe) between the adapter and 
probe (fixed factors: frequency difference direction and 
adapter-probe pitch, or frequency, separation, entered as covariate; 
random factor: participants). Although the probe 
responses were larger for adapter frequencies above than 
below the probe frequency (main effect of frequency difference 
direction: F(1,43) = 55.498, P < 0.001), the frequency 
difference direction had no effect on the pattern of results 
across pitch separations (frequency difference direction by 
pitch separation interaction: F(1,42) = 0.079, P = 0.780). An 
LMM analysis of the probe response latencies also showed 
no significant interaction or main effect of frequency difference 
direction (all P > 0.05). Therefore, the pure-tone probe 
responses from Experiments 1 and 3 were averaged across frequency 
difference direction. The size of the responses to the 
pure-tone probes increased monotonically with increasing 
pitch separation from the adapter (main effect of pitch separation: 
F(1,17) = 19.548, P < 0.001; Fig. 7A). The amount of increase 
was similar between the 0.5- and 1-octave pitch 
separations and between the 1- and 1.5-octave pitch separations 
(as shown by the normality of the residuals from the 
covariance analysis, confirmed with a Shapiro-Wilk W test: 
W = 0.964, P = 0.457). 

In contrast, the probe response size for the IRN conditions 
was related nonmonotonically to the pitch separation 
between the adapter and probe, with the response size for 
the 1-octave pitch separation being significantly smaller than 
the average response size for the 0.5- and 1.5-octave pitch 
separations (LMM analysis with fixed factors pitch chroma 
and spectral resolvability; main effect of chroma: F(1,28) = 
29.865, P < 0.001; Fig. 7B, C). This suggests that adaptation 
for IRN stimuli is influenced by pitch chroma, with adaptation 
being stronger when the adapter and probe have similar 
chroma and weaker when they have dissimilar chroma. Importantly, 
the difference in probe response size between the 
1-octave and half-octave pitch separations was similar for the 
resolved and unresolved conditions (chroma by resolvability 
interaction: F(1,27) = 0.026, P = 0.874). 

The adaptation functions for the IRN stimuli also showed 
an effect of pitch height, in that the probe responses for the 
1.5-octave pitch separation were generally larger than those 
for the 0.5-octave separation (LMM analysis with fixed factors 
pitch height and spectral resolvability; main effect of height: 
F(1,27) = 12.131, P = 0.002). The pitch height effect depended 
on the resolvability of the stimuli (height by resolvability 
interaction: F(1,27) = 6.329, P = 0.018), in that it was significant 
for the resolved (P < 0.001), but not for the unresolved 
(P = 0.500), condition (compare Fig. 7B and C). 

The response latencies for the pure-tone and unresolved 
IRN conditions were practically independent of pitch separation 
(Fig. 7G). The latencies for the unresolved IRN condition 
were much longer than those for the pure-tone 
condition (147 vs. 101 ms, on average). The latencies for the 
resolved IRN condition were intermediate (134 ms on 
average) and varied as a function of the pitch separation, 
being similar to the latencies for the unresolved IRN stimuli at 
the 0.5-octave pitch separation (141 ms) and approaching the 
pure-tone latencies at the 1.5-octave pitch separation (128 ms). 
An LMM analysis of the probe response latencies showed a 
significant main effect of stimulus condition (F(2,70.959) = 
71.369, P < 0.001; P < 0.001 for all pair-wise comparisons), 
and a significant interaction with pitch separation (entered as 
covariate; F(4,67.667) = 7.535, P = 0.001). 

Separate analyses of the N1 and P2 peaks showed that the 
pitch separation effect in the pure-tone condition was driven 
mainly by the N1 (main effect of pitch separation: F(1,17) = 
27.623, P < 0.001; Fig. 7D); the effect was nonsignificant for 
the P2 (F(1,17) = 0.202, P = 0.659). The pitch chroma effect 
for the IRN conditions was found in both the N1 (main effect 
of chroma: F(1,28) = 5.273, P = 0.029) and the P2 (F(1,28) = 
20.983, P < 0.001; Fig. 7E, F). As for the N1-P2 difference, the 
chroma effect was independent of the adapter spectral resolvability 
for both the N1 (chroma by resolvability interaction: 
F(1,27) = 0.285, P = 0.598) and the P2 (F(1,27) = 0.086, P = 
0.771). The pitch height effect in the IRN conditions was 
driven mainly by the P2 (height by resolvability interaction: 
F(1,27) = 5.081, P = 0.033; Fig. 7E, F). The effect was nonsignificant 
for the N1 (F(1,27) = 0.531, P = 0.473). 

Pitch-Chroma or Musical Consonance Effect? 

The first experiment yielded nonmonotonic adaptation func- 
tions for the IRN stimuli, with a dip at the 1-octave pitch sep- 
aration compared with the 0.5- and 1.5-octave separations 
(Fig. 7B, C). Experiment 2 tested the possibility that, rather 
than reflecting a pitch chroma effect, this nonmonotonicity 
arose as a result of the octave being a consonant (i.e. "plea- 
sant"), and the half-octave a dissonant ("unpleasant"), interval 
(Schellenberg and Trehub 1994; McDermott et al. 2010). To 
this end, the resolved and unresolved IRN conditions were remeasured 
with a larger set of pitch separations (6, 7, 9, 12, 13, 16, 
and 18 semitones). We also used a different group of participants 
to test the robustness of the effect. The new set of pitch 
separations included perfectly consonant (perfect fifth, 
octave), imperfectly consonant (major sixth, major third 
modulo 1 octave) and dissonant intervals (tritone, minor 
second and tritone modulo 1 octave; Table 1). If the size of 
the adaptation effect is determined by the degree of conso- 
nance between the adapter and probe, the new adaptation 
functions should exhibit dips and peaks at consonant and dis- 
sonant intervals, respectively. This, however, was not the case 
(Fig. 7H, I); there was no significant difference in probe 
response size between the consonant and dissonant intervals 
(LMM analysis with fixed factors consonance and resolvability; 
main effect of consonance: F(1,154) = 1.448, P = 0.231), 
and the correlation between the probe response sizes and 
consonance ratings from McDermott et al. (2010) was also 
nonsignificant (ρ(14) = -0.299, P = 0.298). Instead, the probe 
response size was a smooth, combined function of the pitch 
chroma and pitch height separations between the adapter and 
probe. According to the helical model of pitch perception 
(see Fig. 1C), the perceptual distance between 2 notes is a 
combined sinusoidal and linear function of their pitch separ- 
ation, with the sinusoidal component representing the dis- 
tance in pitch chroma, and the linear component representing 
the distance in pitch height. This model provided an excellent 
fit to the current data (red dashed lines in Fig. 7H, I), explain- 
ing 92.7% of variance for the resolved, and 78.5% for the un- 
resolved, IRN stimuli (see Materials and Methods for the model 
implementation). The model's sinusoidal (pitch-chroma) 
component was significant for both the resolved (F-test; F(1,4) 
= 27.560, P = 0.006) and unresolved conditions (F(1,4) = 
13.895, P = 0.020). In contrast, the linear (pitch-height) component 
was significant for the resolved condition (F(1,4) 
= 29.055, P = 0.006), but nonsignificant for the unresolved condition 
(F(1,4) = 1.693, P = 0.263). This is consistent with the 
findings from Experiment 1. 

Table 1 
Adapter-probe pitch separations used in Experiment 2 

Pitch separation (semitones)   Musical interval               Consonance 
6                              Tritone                        Dissonant 
7                              Perfect fifth                  Perfect 
9                              Major sixth                    Imperfect 
12                             Octave                         Perfect 
13                             Minor second modulo 1 octave   Dissonant 
16                             Major third modulo 1 octave    Imperfect 
18                             Tritone modulo 1 octave        Dissonant 

The second and third columns show the musical intervals constituted by the pitch separations 
and their degree of consonance. 
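
The helical-model fit reported in this section can be sketched as an ordinary least-squares problem. The following is a minimal illustration only. It assumes that the probe response is modeled as a constant plus a linear term in pitch separation (pitch height) plus a sinusoidal term with a period of 1 octave (pitch chroma); the implementation described in Materials and Methods may differ. The function names are hypothetical, and the response values are synthetic, generated from the model itself rather than taken from this study.

```python
import math

SEPARATIONS = [6, 7, 9, 12, 13, 16, 18]  # semitones, from Table 1


def design_row(semitones):
    """Regressors for one pitch separation: intercept, height, chroma."""
    height = semitones / 12.0  # linear term, in octaves
    # Chroma term: 0 at whole octaves, 1 at half-octaves (period = 12 semitones).
    chroma = 0.5 * (1.0 - math.cos(2.0 * math.pi * semitones / 12.0))
    return [1.0, height, chroma]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_helix(separations, responses):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y."""
    rows = [design_row(s) for s in separations]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(responses) / len(responses)
    ss_res = sum((y - f) ** 2 for y, f in zip(responses, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in responses)
    return beta, 1.0 - ss_res / ss_tot  # coefficients and explained variance


# Synthetic, noise-free responses generated from the model (not study data).
true_beta = [1.0, 0.8, 0.6]
synthetic = [sum(b * v for b, v in zip(true_beta, design_row(s)))
             for s in SEPARATIONS]
beta, r_squared = fit_helix(SEPARATIONS, synthetic)
```

With 7 pitch separations and 3 fitted parameters, this sketch leaves 4 residual degrees of freedom, which is consistent with the F(1,4) tests reported for the individual model components.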

Source Locations 

The third experiment sought to estimate the source locations 
of the probe responses measured in Experiments 1 and 2. To 
maximize the response-to-recording noise ratio (required for 
accurate source localization), only the largest pitch separation 
(1.5 octaves) was used, which had yielded the largest 
responses in the first 2 experiments (Fig. 7), and a large 
number of trials were collected for averaging. The set of re- 
cording locations was also extended to cover a greater pro- 
portion of the lower half of the head surface and thereby 
facilitate source localization of activity in the region of audi- 
tory cortex (see Materials and Methods). 

Source locations of EEG responses are derived from the 
responses' voltage distributions across the head surface, referred 
to as voltage maps. The voltage maps of all probe 
responses (measured over the 40-ms time window around the 
N1 peak; Fig. 8A, B) exhibited negative polarity around the 
vertex (Cz) and polarity inversion at the mastoids, indicating 
source locations in the general region of supratemporal auditory 
cortex. Consequently, the voltage map for each participant 
and condition was fitted with a source model consisting 
of 2 equivalent current dipoles, which were unconstrained in 
both location and orientation (Fig. 8C). Each dipole models 
the neural activity in a circumscribed region of cortex (in this 
case, supratemporal auditory cortex in the left or right hemisphere). 
The dipole location reflects the centroid of the active 
region and the dipole orientation the direction of its net 
current flow (Scherg 1990). Two of the fitted dipoles were 
located at the boundary of the head model and were excluded 
from subsequent analysis. The locations and orientations of 
the remaining dipoles (goodness of fit > 98%) were submitted 
to a permutation procedure (with 1000 resamples; Efron and 
Tibshirani 1993) to test for differences between the stimulus 
conditions. 

Figure 8. Voltage distribution maps and dipole source models for the probe 
responses from Experiment 3. (A and B) Voltage maps of one representative 
participant for the pure-tone (A) and unresolved IRN conditions (B). (C) Dipole source 
models for the pure-tone (yellow) and unresolved IRN (red) conditions, averaged 
across participants and projected onto sections of the MNI (Montreal Neurological 
Institute) template brain. Ellipses show 95% confidence intervals for the dipole 
locations, computed using a bootstrapping procedure. The dipoles for the resolved 
IRN condition are not shown to avoid clutter. The locations and orientations of the 
resolved IRN dipoles fell between those for the pure-tone and unresolved IRN 
conditions. 

The dipoles for the pure-tone condition were located on 
medial Heschl's gyrus (Talairach coordinates [left/right]: 
-41.9, -18.8, 15.8/44.2, -13.4, 13.4 mm), close to the cen- 
troid of primary area TE1.0 (Morosan et al. 2001). The largest 
differences in source location were observed between the 
pure-tone and unresolved IRN conditions. The Euclidean dis- 
tance between the dipole locations for these conditions was 
significant in both hemispheres (left: P = 0.024, right: P = 
0.047). Compared with the dipoles for the pure-tone condition 
(shown in yellow in Fig. 8C), the dipoles for the unresolved 
IRN condition (shown in red) were located 7.2 mm 
more lateral in the left hemisphere and 7.9 mm more anterior 
in the right hemisphere (Talairach coordinates [left/right]: 
-49.1, -21.2, 17.2/42.9, -5.5, 17.6 mm). Permutation tests 
showed that both differences were significant (left: P< 0.001, 
right: P= 0.026). Differences in dipole orientation were ana- 
lyzed by treating the dipoles as vectors in the 3-dimensional 
unit sphere. The angle subtended by the arc between the 
vector endpoints is the "central angle." The central angle 
between the dipole orientations for the pure-tone and unre- 
solved IRN conditions was significant in both hemispheres 
(left: P< 0.001; right: P = 0.040). This was due to the sagittal 
and transversal projections of the dipoles for the unresolved 
IRN condition being significantly more forward pointing than 
those for the pure-tone condition (sagittal [left/right]: P = 
0.014/0.016; transversal [left/right]: P= 0.013/0.037). The 
dipole locations and orientations for the resolved IRN condition 
lay between those for the pure-tone and unresolved 
IRN conditions (Talairach coordinates [left/right]: -47.4, 
-21.9, 17.4/43.0, -5.3, 17.1 mm). Note that the reported Talairach 
coordinates are based on standard electrode positions 
and should thus be viewed as approximations. This 
does not, however, affect the observed differences between 
conditions. 
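
The location shifts quoted above follow directly from the reported Talairach coordinates, and the central angle is the angle between two orientation vectors. The sketch below reproduces the shifts and shows one way to compute the angle. The function names are hypothetical, and the example orientation vectors in the usage note are illustrative because the fitted dipole orientations are not listed in the text.

```python
import math

# Talairach coordinates (x, y, z) in mm, as reported in the text.
PURE_TONE = {"left": (-41.9, -18.8, 15.8), "right": (44.2, -13.4, 13.4)}
UNRESOLVED_IRN = {"left": (-49.1, -21.2, 17.2), "right": (42.9, -5.5, 17.6)}


def displacement(a, b):
    """Component-wise shift from location a to location b (mm)."""
    return tuple(q - p for p, q in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two dipole locations (mm)."""
    return math.sqrt(sum(d * d for d in displacement(a, b)))


def central_angle(u, v):
    """Angle (degrees) between two orientation vectors of nonzero length."""
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


# Negative x is left and positive y is anterior in Talairach space.
left_shift = displacement(PURE_TONE["left"], UNRESOLVED_IRN["left"])
right_shift = displacement(PURE_TONE["right"], UNRESOLVED_IRN["right"])
left_distance = euclidean_distance(PURE_TONE["left"], UNRESOLVED_IRN["left"])
```

Here `left_shift[0]` is -7.2 mm (more lateral in the left hemisphere) and `right_shift[1]` is 7.9 mm (more anterior in the right hemisphere), matching the values in the text. As a check on the angle function, `central_angle((1, 0, 0), (0, 1, 0))` gives 90 degrees for two orthogonal vectors.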



Discussion 

This study used an adaptation paradigm with EEG to investi- 
gate whether the pitch of complex tonal sounds, such as 
voiced speech, or music, is represented by its physical dimen- 
sion (i.e. the waveform repetition rate) or by its perceptual 
dimensions (pitch height and pitch chroma) in human audi- 
tory cortex. The adaptation approach is based on the assump- 
tion that those neurons that respond most strongly to the 
adapter are also most adapted by it (see Grill-Spector et al. 
2006, for review). According to this assumption, the amount 
of adaptation for a given adapter and probe should be deter- 
mined by the overlap between, and thus the selectivity of, 
their neural representations. The most important finding of 
this study was that the adaptation functions for the IRN 
stimuli were nonmonotonic, with adaptation being stronger 
(i.e. the probe response being smaller) when the adapter and 
probe were separated by an octave than by a half-octave, or 
tritone. This suggests that a note and its octave share greater 
overlap in neural representation than a note and its half- 
octave. Experiment 2 ruled out the possibility that this non- 
monotonicity was due to the octave being a more consonant 
interval than the half-octave, indicating that it represents a 
true pitch chroma, rather than a consonance, effect. Impor- 
tantly, the effect was as large for the unresolved IRN stimuli 
as for the resolved stimuli. As unresolved stimuli produce a 
uniform activity distribution across the tonotopic array, this 
rules out the possibility that the pitch chroma effect was due 
to greater harmonic overlap between notes with similar than 
with dissimilar pitch chroma. These results suggest 
that human auditory cortex contains neurons that are selective 
for pitch chroma. 
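
To make the cyclical structure of pitch chroma concrete, the following minimal sketch computes the circular chroma distance for a given pitch separation. The function name `chroma_distance` is hypothetical, and the sketch assumes equal-tempered semitones with 12 semitones per octave. It illustrates the geometry only and is not part of the analysis.

```python
def chroma_distance(semitones):
    """Circular distance in pitch chroma (0 to 6 semitones)."""
    within_octave = semitones % 12
    # Moving up or down the chroma circle, whichever is shorter.
    return min(within_octave, 12 - within_octave)


# Pitch separations from Table 1 (semitones).
table_1 = [6, 7, 9, 12, 13, 16, 18]
distances = {s: chroma_distance(s) for s in table_1}
```

Under this measure, the octave (12 semitones) has a chroma distance of 0, whereas the tritone (6 semitones) and the tritone modulo 1 octave (18 semitones) both have the maximum distance of 6.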

It is unlikely that the probe responses reflect processes in- 
volved in auditory deviance detection. The adapters and 
probes were presented with equal probability, and each 
probe was preceded by only a single adapter. Previous work 
suggests that such conditions are ineffective in eliciting the 
predictive processes that are thought to underlie the auditory 
deviance, or mismatch, response (e.g. Sams et al. 1983; 
Cowan et al. 1993; Winkler et al. 1996). Predictive processing 
would be expected to depend on the perceptual dissimilarity 
between the adapter and probe. Butler (1968) and Megela 
and Teyler (1979) found that adaptation in the N1-P2 ampli- 
tude is inconsistent with this expectation. They showed that a 
loud adapter is more effective at suppressing the response to 
a quiet probe than vice versa, despite the perceptual dissimi- 
larity between the adapter and probe being the same in both 
cases. Results by Wacogne et al. (2011) suggest that predictive 
processing related to the auditory deviance response involves 
areas in frontal and other associative cortices. In this study, 
nonauditory contributions to probe responses were marginal; 
a principal component analysis within a time window encompassing 
all 3 deflections (P1, N1, and P2) of the grand-average 
probe response in Experiment 3 showed that a single spatial 
component explained over 98% of the variance in that 
response. This is consistent with Garrido et al.'s (2007) 
finding that top-down modulation of auditory-evoked cortical 
responses from nonauditory sources only becomes apparent 
after about 220 ms into the response. 

In contrast to the responses to the IRN stimuli, the pure- 
tone responses increased monotonically with increasing pitch 
(or frequency) separation between the adapter and probe. 
This suggests that they were produced by different generators. 
The fact that the pure-tone responses occurred at much 
shorter latencies than the IRN responses suggests that they 
were generated at a lower processing level. The source analy- 
sis results from Experiment 3 indicated that the source of the 
pure-tone responses was centered on medial Heschl's gyrus, 
suggesting that the responses were generated in primary audi- 
tory cortex. Primary auditory cortex is known to contain a 
topographic representation of frequency (referred to as a "tonotopic 
map"; Formisano et al. 2003; Talavage et al. 2004). The 
fact that the size of the pure-tone responses increased linearly 
with increasing frequency separation "in octaves" suggests 
that the gradient of the tonotopic map in human primary 
auditory cortex, like that of the cochlear tonotopic map, rep- 
resents logarithmic frequency. 
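
As an illustration of what a logarithmic frequency axis implies, the sketch below expresses the separation between two tone frequencies in octaves. The function name and the example frequencies are hypothetical and are not the stimulus frequencies used in this study.

```python
import math


def separation_in_octaves(f_adapter_hz, f_probe_hz):
    """Unsigned pitch separation between two frequencies, in octaves."""
    return abs(math.log2(f_probe_hz / f_adapter_hz))


# Equal frequency ratios give equal separations on a logarithmic axis,
# even though the differences in Hz (500 vs. 1000) are not equal.
low_pair = separation_in_octaves(500.0, 1000.0)
high_pair = separation_in_octaves(1000.0, 2000.0)
```

Both pairs are 1 octave apart, so a response that grows linearly with separation in octaves treats them as equally distant.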

The source of the IRN responses was located somewhat 
anterior and lateral to the source of the pure-tone responses, 
suggesting that it was part of a network of nonprimary areas 
identified as being specifically sensitive to pitch or pitch 
change (melody) by previous neuroimaging and neurophysio- 
logical studies (Patterson et al. 2002; Penagos et al. 2004; 
Gutschalk et al. 2004; Bendor and Wang 2005, 2010; Hall 
et al. 2006; Puschmann et al. 2010). Our results suggest that 
this network contains a parametric representation of pitch 
chroma. This is consistent with the finding by Warren et al. 
(2003) that a region anterior to Heschl's gyrus responded 
more strongly to tonal sequences that changed in pitch 
chroma than pitch height. 

Although the resolved and unresolved IRN responses 
showed a similar pitch chroma effect, it is likely that the 
resolved responses constituted a mixture of contributions 
from both the pitch-sensitive nonprimary source and the 
frequency-selective primary source. This is suggested by 
the intermediate response latencies and source locations for 
the resolved IRN condition. The variation in response latency 
with pitch separation suggests that the relative proportions of 
the primary and nonprimary contributions to the resolved 
IRN responses varied as a function of pitch separation, with 
the primary contribution increasing with increasing pitch sep- 
aration (and thus spectral dissimilarity) between the adapter 
and probe. The same mechanism probably also accounts for 
the pitch height effect observed in the resolved IRN condition 
(i.e. the fact that the response was larger for the 1.5- than 
0.5-octave pitch separation). 

The absence of any pitch chroma effect in the pure-tone 
condition suggests that, despite eliciting pitch, the pure tones 
did not evoke any notable response from the pitch-sensitive 
nonprimary source. This runs counter to the idea that audi- 
tory cortex contains dedicated pitch neurons that code pitch 
invariant of the stimulus spectral properties, or timbre. Invar- 
iant pitch neurons would have been expected to respond to 
IRNs and pure tones alike. Our results are consistent with the 
results from a seminal study by Butler (1972), who showed 
that a pure tone with a given pitch does not adapt the 
response to a complex tone with the same pitch but nonover- 
lapping spectral composition. The results of Butler's study 
and our results imply that pure tones and complex tones acti- 
vate different neurons in auditory cortex. This is consistent 
with the finding that pure tones are an inefficient stimulus for 
driving nonprimary auditory neurons (Schreiner and Cynader 
1984; Rauschecker et al. 1995; Wessinger et al. 2001; Hall 
et al. 2002) as well as the failure by previous neurophysiologi- 
cal studies to find neurons in primary auditory cortex that 
respond to the pitch of complex tones with frequency com- 
ponents outside of the neurons' frequency response areas 
(Schwarz and Tomlinson 1990; Fishman et al. 1998; Steinsch- 
neider et al. 1998). Bendor and Wang's (2005, 2010) studies 
represent an exception to this failure, but there is still some 
uncertainty as to whether their results can be attributed to dis- 
tortion products, which arise as a result of the nonlinearity of 
cochlear processing (McAlpine 2004; Abel and Kössl 2009). 

Taken together, the current and previous results suggest 
that in mammalian auditory cortex, pitch is corepresented to- 
gether with the stimulus spectrum (or timbre), rather than 
being represented separately in a dedicated map. Recent 
studies by Nelken et al. (2008) and Bizley et al. (2009) 
suggest that other sound features, such as spatial location, 
may also be included in this representation. Their results 
showed that most neurons in both primary and nonprimary 
auditory fields are sensitive to specific combinations of pitch, 
timbre, and location. This is similar to the visual cortex, 
where neurons represent combinations of features such as 
retinal location, orientation, and ocular dominance (Hubel 
and Wiesel 1977). It is possible that like the coding of certain 
visual features, such as faces or objects, pitch coding might 
become more specialized, and thus invariant to other features, 
at higher levels of processing. These levels might lie beyond 
the levels that generate the N1 and P2 deflections measured 
in this study. Alternatively, their activation might occur only 
under active listening conditions. 

Although both the N1 and P2 showed the pitch chroma 
effect for the IRN stimuli, only the N1 showed the frequency 
separation effect for the pure tones. The N1 and P2 covary 
along many stimulus dimensions, which is why they have 
often been treated as a unitary phenomenon. However, the 
N1 and P2 are affected differently by attention and sleep, 
show different maturational time courses and have somewhat 
distinct topographies. This suggests that their generators are 
at least partially separate (see Crowley and Colrain 2004, for 
review). Epicortical and intracortical recordings in the rat 
auditory cortex suggest that the N1 reflects responses to 
frequency-selective thalamocortical input, whereas the P2 is 
generated by more widespread corticocortical connections 
(Barth and Di 1990; Barth et al. 1993). This may explain why 
the N1, but not the P2, showed the frequency separation 
effect in the pure-tone condition. 

A recent study by Baumann et al. (2011), which used fMRI 
to investigate pitch mapping in macaque monkeys, found that, 
at the level of the inferior colliculus (IC), pitch is mapped 
monotonically, with the represented pitch changing progress- 
ively from one end of the map to the other. This suggests that 
the IC represents the physical dimension of pitch (waveform 
repetition rate). Our finding that adaptation in auditory cortex 
shows selectivity for pitch chroma suggests that, at the level of 
cortex, the pitch map is circular rather than monotonic. For 
instance, it might resemble the pinwheel map of image orien- 
tation in visual cortex, where adjacent orientations are ar- 
ranged like spokes around a central point (Bonhoeffer and 
Grinvald 1991). Circularity would ensure that the map is 
locally smooth, that is, nearby neurons share similar response 
properties. Local smoothness represents a key principle in the 
formation of cortical sensory maps (Swindale 1996). 

Neurons that are selective for pitch chroma might underlie 
the perception of melody in music. Interconnection between 
different chroma-selective neurons might create sensitivity to 
common musical intervals. Evidence for such sensitivity has 
been found in neurophysiological recordings from the cat and 
monkey auditory cortex (Brosch et al. 1999; Brosch and 
Schreiner 2000). 

The pitch adaptation paradigm developed in this study 
could be used to investigate the neural correlates of amusia. 
Congenital amusics make up about 4% of the general popu- 
lation. They have sometimes severe, lifelong difficulties in ap- 
preciating and producing music (Kalmus and Fry 1980). The 
causes of amusia are still a subject of debate (see Peretz and 
Hyde 2003, for review). The current paradigm yields a direct 
measure of neural pitch representation, unconfounded by 
task requirements, and might thus provide a tool for investi- 
gating whether amusia stems from a problem with the basic 
representation of pitch as opposed to the more general cogni- 
tive processes involved in music perception. 